{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7btcrgth5z6f"
      },
      "source": [
        "# Getting started with LLM APIs\n",
        "\n",
        "_Thank you to [Igor Ribeiro Lima](https://github.com/igorlima) for writing this notebook!_\n",
        "\n",
        "This notebook demonstrates how to call different LLM APIs using just one set of common functions.\n",
        "\n",
        "While each LLM has its own configuration, here's a cool perk of using [`txtai`](https://neuml.github.io/txtai/) plus [`litellm`](https://docs.litellm.ai/docs/) working behind the scenes. The big win with `txtai` is that it helps standardize inputs and outputs across several LLMs. **This means we can seamlessly call any LLM API**.\n",
        "\n",
        "In this notebook, we use one set of common functions to call _**Gemini**_, _**VertexAI**_, _**Mistral**_, _**Cohere**_, _**AWS Bedrock**_, _**OpenAI**_, _**Claude by Anthropic**_, _**Groq**_, _and more_!\n",
        "\n",
        "Isn't that amazing? One method for many different LLM APIs - **that's the beauty of these libraries**!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lhrk9y766Lw1"
      },
      "source": [
        "## Install dependencies\n",
        "\n",
        "This session guides you through all the essential dependencies you'll need to run the Python script for various LLM APIs.\n",
        "\n",
        "You can choose the dependencies based on your specific needs. Each dependency is carefully commented on in the code, so you'll know exactly which API it supports.\n",
        "\n",
        "The core dependencies you'll typically need are: `txtai` and `txtai[pipeline]`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "czzRItdyN-Zj"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "# The main point here is to follow the pattern of other notebooks.\n",
        "# For more information, kindly refer to the note in the bullet right below.\n",
        "\n",
        "# !pip install txtai==8.1.0\n",
        "# !pip install txtai[pipeline]\n",
        "!pip install git+https://github.com/neuml/txtai#egg=txtai[pipeline]\n",
        "\n",
        "# to use Vertex AI\n",
        "# https://github.com/BerriAI/litellm/issues/5483\n",
        "# !pip install google-cloud-aiplatform==1.75.0\n",
        "!pip install google-cloud-aiplatform\n",
        "\n",
        "# to use AWS Bedrock\n",
        "# https://docs.litellm.ai/docs/providers/bedrock\n",
        "# !pip install boto3==1.35.88\n",
        "!pip install boto3"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "evHD8Kw09Aa0"
      },
      "source": [
        "- _**A friendly reminder**: things can change unexpectedly in the ever-evolving world of coding. One day, your code works flawlessly; the next, it might throw a tantrum for no apparent reason. That's why it's essential to specify the dependencies' versions using the latest available versions when writing the code._\n",
        "  - <sup><sub>Even recognizing the importance of version control, each version has one line above as a comment in the code. It serves as a guide to help you track dependencies and ensures the code runs smoothly across different environments, even if something goes awry.</sub></sup>\n",
        "  - <sup><sub>While there is [a trade-off](https://github.com/neuml/txtai/pull/844#issuecomment-2564294186) in limiting code to a specific version, noting today's version as a comment should help you protect against potential issues with future updates.</sub></sup>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wuCvBPCL-VQG"
      },
      "source": [
        "## LLM API Configuration\n",
        "\n",
        "This session is like a special Python dictionary that holds all the cool configurations for our LLM AI model. It includes details like environment variables, the model's name, text embedding parameters, and other fancy settings.\n",
        "\n",
        "One important key here is `IS_ENABLED`. When it's set to `True`, it's like giving the model a green light to shine! But if you ever feel like taking a break or don't need it for a while, you can easily set this key to `False`, and the model will chill out."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jyahNyOrO2yp",
        "outputId": "8143c2b0-a6a9-429b-a66f-93f39e6885e4"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "config set!\n"
          ]
        }
      ],
      "source": [
        "import os, getpass\n",
        "from txtai import LLM, Embeddings\n",
        "\n",
        "# https://neuml.github.io/txtai/install/\n",
        "# https://neuml.github.io/txtai/pipeline/text/llm/#example\n",
        "# https://neuml.github.io/txtai/embeddings/configuration/vectors/#method\n",
        "# https://docs.litellm.ai/docs/embedding/supported_embedding\n",
        "LLM_MODEL_CONFIG = {\n",
        "  'GEMINI': {\n",
        "    'IS_ENABLED': True,\n",
        "    'ENV_VAR': ['GEMINI_API_KEY'],\n",
        "    # https://docs.litellm.ai/docs/providers/gemini\n",
        "    # https://ai.google.dev/gemini-api/docs/models/gemini\n",
        "    'LLM_MODEL_NAME': 'gemini/gemini-pro',\n",
        "    # https://github.com/BerriAI/litellm/tree/12c4e7e695edb07d403dd14fc768a736638bd3d1/litellm/llms/vertex_ai\n",
        "    # https://github.com/BerriAI/litellm/blob/e19bb55e3b4c6a858b6e364302ebbf6633a51de5/model_prices_and_context_window.json#L2625\n",
        "    'TEXT_EMBEDDING_PATH': 'gemini/text-embedding-004'\n",
        "  },\n",
        "  'COHERE': {\n",
        "    'IS_ENABLED': False,\n",
        "    'ENV_VAR': ['COHERE_API_KEY'],\n",
        "    # https://docs.litellm.ai/docs/providers/cohere\n",
        "    'LLM_MODEL_NAME': 'command-light',\n",
        "    'TEXT_EMBEDDING_PATH': 'cohere/embed-english-v3.0'\n",
        "  },\n",
        "  'AWS_BEDROCK': {\n",
        "    'IS_ENABLED': False,\n",
        "    'ENV_VAR': ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_REGION_NAME'],\n",
        "    # https://docs.litellm.ai/docs/providers/bedrock\n",
        "    'LLM_MODEL_NAME': 'bedrock/amazon.titan-text-lite-v1',\n",
        "    'TEXT_EMBEDDING_PATH': 'amazon.titan-embed-text-v1'\n",
        "  },\n",
        "  'MISTRAL': {\n",
        "    'IS_ENABLED': False,\n",
        "    'ENV_VAR': ['MISTRAL_API_KEY'],\n",
        "    # https://docs.litellm.ai/docs/providers/mistral\n",
        "    'LLM_MODEL_NAME': 'mistral/mistral-tiny',\n",
        "    'TEXT_EMBEDDING_PATH': 'mistral/mistral-embed'\n",
        "  },\n",
        "  'VERTEXAI': {\n",
        "    'IS_ENABLED': False,\n",
        "    'ENV_VAR': ['GOOGLE_APPLICATION_CREDENTIALS', 'GOOGLE_VERTEX_PROJECT', 'GOOGLE_VERTEX_LOCATION'],\n",
        "    'ENV_VAR_SETUP': None,\n",
        "    # https://docs.litellm.ai/docs/providers/vertex\n",
        "    'LLM_MODEL_NAME': 'vertex_ai/gemini-pro',\n",
        "    'TEXT_EMBEDDING_PATH': 'vertex_ai/text-embedding-004'\n",
        "  },\n",
        "  'GROQ': {\n",
        "    'IS_ENABLED': False,\n",
        "    'ENV_VAR': ['GROQ_API_KEY'],\n",
        "    # https://docs.litellm.ai/docs/providers/groq\n",
        "    # https://console.groq.com/docs/models\n",
        "    'LLM_MODEL_NAME': 'groq/llama3-8b-8192',\n",
        "    # the line below has been commented out for a reason.\n",
        "    # to understand why, check out the Groq section right below.\n",
        "    # https://groq.com/retrieval-augmented-generation-with-groq-api/\n",
        "    # 'TEXT_EMBEDDING_PATH': 'jinaai/jina-embeddings-v2-base-en',\n",
        "  },\n",
        "  'OPENAI': {\n",
        "    'IS_ENABLED': False,\n",
        "    'ENV_VAR': ['OPENAI_API_KEY'],\n",
        "    # https://docs.litellm.ai/docs/providers/openai\n",
        "    # https://platform.openai.com/docs/models\n",
        "    'LLM_MODEL_NAME': 'gpt-4o-mini-2024-07-18',\n",
        "    'TEXT_EMBEDDING_PATH': 'text-embedding-3-small',\n",
        "  },\n",
        "  'CLAUDE': {\n",
        "    'IS_ENABLED': False,\n",
        "    # You might need the Voyage API key, depending on the embedding model you choose. Read more below!\n",
        "    # 'ENV_VAR': ['ANTHROPIC_API_KEY', 'VOYAGE_API_KEY'],\n",
        "    'ENV_VAR': ['ANTHROPIC_API_KEY'],\n",
        "    # https://docs.litellm.ai/docs/providers/anthropic\n",
        "    # https://docs.anthropic.com/en/docs/about-claude/models\n",
        "    'LLM_MODEL_NAME': 'anthropic/claude-3-5-haiku-20241022',\n",
        "    # the line below has been commented out for a reason.\n",
        "    # to understand why, check out the Claude Anthropic section right below.\n",
        "    # https://docs.anthropic.com/en/docs/build-with-claude/embeddings\n",
        "    # 'TEXT_EMBEDDING_PATH': 'voyage/voyage-01',\n",
        "  },\n",
        "  'VOYAGE': {\n",
        "    'IS_ENABLED': False,\n",
        "    # You might need the Anthropic API key, depending on the LLM you choose. Read more below!\n",
        "    # 'ENV_VAR': ['ANTHROPIC_API_KEY', 'VOYAGE_API_KEY'],\n",
        "    'ENV_VAR': ['VOYAGE_API_KEY'],\n",
        "    # the line below has been commented out for a reason.\n",
        "    # to understand why, check out the Voyage AI section right below.\n",
        "    # https://docs.litellm.ai/docs/providers/voyage\n",
        "    # https://docs.voyageai.com/docs/introduction\n",
        "    # 'LLM_MODEL_NAME': 'anthropic/claude-3-5-haiku-20241022',\n",
        "    #\n",
        "    # https://docs.voyageai.com/docs/embeddings\n",
        "    'TEXT_EMBEDDING_PATH': 'voyage/voyage-01',\n",
        "  }\n",
        "}\n",
        "\n",
        "import litellm\n",
        "# # https://github.com/BerriAI/litellm/blob/11932d0576a073d83f38a418cbdf6b2d8d4ff46f/litellm/litellm_core_utils/get_llm_provider_logic.py#L322\n",
        "litellm.suppress_debug_info = True\n",
        "# https://docs.litellm.ai/docs/debugging/local_debugging#set-verbose\n",
        "litellm.set_verbose=False\n",
        "\n",
        "def customsetup():\n",
        "  def vertexai():\n",
        "    # https://docs.litellm.ai/docs/embedding/supported_embedding#usage---embedding\n",
        "    # https://docs.litellm.ai/docs/providers/vertex\n",
        "    litellm.vertex_project = os.environ['GOOGLE_VERTEX_PROJECT']\n",
        "    litellm.vertex_location = os.environ['GOOGLE_VERTEX_LOCATION']\n",
        "\n",
        "  LLM_MODEL_CONFIG['VERTEXAI']['ENV_VAR_SETUP'] = vertexai()\n",
        "\n",
        "customsetup()\n",
        "\n",
        "LLM_MODELS = {k: v for k, v in LLM_MODEL_CONFIG.items() if v['IS_ENABLED']}\n",
        "ENV_VARS = [v['ENV_VAR'] for k, v in LLM_MODELS.items()]\n",
        "ENV_VARS = [x for xs in ENV_VARS for x in xs] # flatten array\n",
        "\n",
        "print(\"config set!\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pgQuDv4fZgHe"
      },
      "source": [
        "### Vertex AI"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wG_2u_jKTyYg"
      },
      "source": [
        "When working with VertexAI in a Jupyter notebook, an environment variable called `GOOGLE_APPLICATION_CREDENTIALS` points to a credentials file. This variable is like giving VertexAI a secret map to find the JSON file containing your service account key.\n",
        "\n",
        "Imagine you've set `GOOGLE_APPLICATION_CREDENTIALS` to `application_default_credentials.json`. The service account key is stored in a file with that exact name. And guess what? This file must be in the same place you're working _(your current working directory)_.\n",
        "\n",
        "Here's how your directory might look:\n",
        "```\n",
        ".\n",
        "├── ..\n",
        "├── sample_data/\n",
        "└── application_default_credentials.json\n",
        "```\n",
        "\n",
        "If you use Google Colab, your directory structure will differ slightly, but don't worry! It'll look something like this:\n",
        "\n",
        "```\n",
        ".\n",
        "├── bin\n",
        "├── boot\n",
        "├── content/\n",
        "│   ├── sample_data/\n",
        "│   └── application_default_credentials.json\n",
        "├── datalab\n",
        "├── dev\n",
        "├── etc\n",
        "├── home\n",
        "├── lib\n",
        "├── lib32\n",
        "└── lib64\n",
        "```\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0KsLGZFtrZVT"
      },
      "source": [
        "### Groq"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-4RUn2dUrcbZ"
      },
      "source": [
        "It's good to make the most of [free resources](https://github.com/BerriAI/litellm/issues/4922#issuecomment-2374234548) like this one! Of course, free options often come with some limitations, and this is no different. Currently, Groq doesn't offer an API for text embeddings.\n",
        "\n",
        "The Groq team highlights a pre-trained model that can be used locally. You can learn more about it on the [Groq Blog](https://groq.com/retrieval-augmented-generation-with-groq-api/), where they discuss the `jinaai/jina-embeddings-v2-base-en`, a model that can be hosted locally.\n",
        "\n",
        "This notebook primarily focuses on well-known online API models, so I've commented on the Groq line for text embedding. However, if you're curious, you can explore the model Groq recommends for local use by uncommenting it."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "I7a3d8EsrdBc"
      },
      "source": [
        "### Claude Anthropic"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RHmrKCbZriz_"
      },
      "source": [
        "While Claude Anthropic isn't a free resource, they're well-known for their top-notch LLM outputs. The only tiny hiccup for this Notebook context is that they don't offer an API for text embeddings.\n",
        "\n",
        "But don't worry. In their documentation, they recommend checking out the Voyage AI embedding model. You can find all the deets right [here](https://docs.anthropic.com/en/docs/build-with-claude/embeddings).\n",
        "\n",
        "The Claude team highlights the Voyage AI embedding model on their documentation page. Voyage is an online resource, though the Claude Anthropic team does not host it.\n",
        "\n",
        "This notebook is about well-known online API models. So, I've commented on the Claude config line for text embedding. But if you're curious and want to explore the model Claude recommends, uncomment that line and explore away!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IYpWxGBR2COD"
      },
      "source": [
        "### Voyage AI"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "j6vIcO0A2FjD"
      },
      "source": [
        "Voyage AI provides only [top-notch embeddings](https://docs.voyageai.com/docs/introduction). While it doesn't offer [its LLM model](https://docs.voyageai.com/docs/embeddings), that doesn't diminish its offerings. The cool thing? Their embeddings are versatile and can be integrated into any model you choose. So, feel free to experiment and have fun with them! That's the beauty of this notebook - it's all about exploration and learning."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xZg16h4KHpE5"
      },
      "source": [
        "## Environment Variables\n",
        "\n",
        "This session has scripts to reset or set environment variables _(env vars)_. Using env vars is a great way to keep those sensitive API KEY values safe and sound, away from unwanted eyes."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Cqh5j0a4JcHg"
      },
      "source": [
        "### Script to Reset Environment Variables\n",
        "\n",
        "There are a bunch of reasons to reset the environment variable here. Let's go over a few:\n",
        "- **Switching API Keys:** Sometimes, we need to change to a different API key.\n",
        "- **Typos Happen:** We're all human, and typos can sneak in!\n",
        "- **Skipping Variables:** Sometimes we need to skip one or another, so we set the env var to empty.\n",
        "- **Fresh Start:** After tweaking a script or setup, it's always good to rerun with a fresh environment.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VejHDLF1aKub"
      },
      "outputs": [],
      "source": [
        "if 'MISTRAL_API_KEY' in os.environ:\n",
        "  del os.environ['MISTRAL_API_KEY']"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "vzrNvU5XJpIm",
        "outputId": "05181b36-caa8-417d-fcd6-159e15118594"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "unset env var: GEMINI_API_KEY\n",
            "unset env var: COHERE_API_KEY\n",
            "unset env var: AWS_ACCESS_KEY_ID\n",
            "unset env var: AWS_SECRET_ACCESS_KEY\n",
            "unset env var: AWS_REGION_NAME\n",
            "unset env var: MISTRAL_API_KEY\n",
            "unset env var: GOOGLE_APPLICATION_CREDENTIALS\n",
            "unset env var: GOOGLE_VERTEX_PROJECT\n",
            "unset env var: GOOGLE_VERTEX_LOCATION\n"
          ]
        }
      ],
      "source": [
        "for ENV_VARS in [v['ENV_VAR'] for k, v in LLM_MODEL_CONFIG.items()]:\n",
        "  for ENV_VAR in ENV_VARS:\n",
        "    if ENV_VAR in os.environ:\n",
        "      del os.environ[ENV_VAR]\n",
        "    print(f'unset env var: {ENV_VAR}')\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GqZceA9qJIdq"
      },
      "source": [
        "### Script to Set Environment Variables\n",
        "\n",
        "In this session, the script will stroll through the config dictionary to cherry-pick only the enabled LLM models.\n",
        "\n",
        "Once your favorites have been gathered, the script will prompt you to set all the necessary environment variables.\n",
        "\n",
        "The more LLM models you have enabled, the more prompts you'll see - but don't worry, each environment variable will prompt only once."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "zQmXuvbsHSwR",
        "outputId": "07e94a99-817b-4fae-e649-c6e0337bb1d1"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "enter GEMINI_API_KEY: ··········\n",
            "enter COHERE_API_KEY: ··········\n",
            "enter AWS_ACCESS_KEY_ID: ··········\n",
            "enter AWS_SECRET_ACCESS_KEY: ··········\n",
            "enter AWS_REGION_NAME: ··········\n",
            "enter MISTRAL_API_KEY: ··········\n",
            "enter GOOGLE_APPLICATION_CREDENTIALS: ··········\n",
            "enter GOOGLE_VERTEX_PROJECT: ··········\n",
            "enter GOOGLE_VERTEX_LOCATION: ··········\n",
            "enter GROQ_API_KEY: ··········\n",
            "enter OPENAI_API_KEY: ··········\n",
            "enter ANTHROPIC_API_KEY: ··········\n",
            "enter VOYAGE_API_KEY: ··········\n",
            "running custom env var setup for VERTEXAI...\n",
            "done setup for VERTEXAI\n",
            "all env var set!\n"
          ]
        }
      ],
      "source": [
        "for ENV_VAR in ENV_VARS:\n",
        "  os.environ[ENV_VAR] = getpass.getpass(f\"enter {ENV_VAR}: \") if not ENV_VAR in os.environ else os.environ[ENV_VAR]\n",
        "\n",
        "LLM_MODELS_WITH_CUSTOM_SETUP = {k: v for k, v in LLM_MODELS.items() if 'ENV_VAR_SETUP' in v}\n",
        "for LLM_MODEL, LLM_CONFIG in LLM_MODELS_WITH_CUSTOM_SETUP.items():\n",
        "  print(f\"running custom env var setup for {LLM_MODEL}...\")\n",
        "  LLM_CONFIG['ENV_VAR_SETUP']()\n",
        "  print(f\"done setup for {LLM_MODEL}\")\n",
        "\n",
        "print(\"all env var set!\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tnTSFJZcJW5p"
      },
      "source": [
        "# Code Snippet\n",
        "\n",
        "You can run the code snippet after installing the necessary dependencies and setting up the required environment variables!\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oPq-sHZifHFZ"
      },
      "source": [
        "## Introduction\n",
        "\n",
        "This code snippet is designed to achieve only two tasks: _(i)_ run an LLM Pipeline; _(ii)_ text Embed using the LLM Model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gKd9GB4UfJTt"
      },
      "source": [
        "The code snippet has two tasks:\n",
        "1. **Running an LLM Pipeline**:\n",
        "   - To kickstart an LLM pipeline, you'll need a prompt and a model name. It's as simple as that!\n",
        "2. **Text Embedding using the LLM Model**:\n",
        "   - To embed text with the LLM model, you'll need some text and the name of the text embedding model.\n",
        "\n",
        "In this script, we'll run an LLM pipeline for every given LLM and embed text using the provided LLM model.\n",
        "\n",
        "Here's the heart of the script:\n",
        "```python\n",
        "LLM_MODEL_NAME = \"model-name\"\n",
        "TEXT_EMBEDDING_PATH = \"text-embedding-name\"\n",
        "# A text prompt to send through the LLM pipeline\n",
        "LLM_PROMPT_INPUT = \"Where is one place you'd go in Washington, DC?\"\n",
        "# The embeddings dataset is versatile! It plays with lists, datasets, or even generators.\n",
        "EMBEDDING_DATA = [\n",
        "    \"US tops 5 million confirmed virus cases\",\n",
        "    \"Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\",\n",
        "    \"Beijing mobilises invasion craft along coast as Taiwan tensions escalate\"\n",
        "]\n",
        "# LLM PIPELINE\n",
        "llm = LLM(LLM_MODEL_NAME, method=\"litellm\")\n",
        "print(llm([{\"role\": \"user\", \"content\": LLM_PROMPT_INPUT}]))\n",
        "# TEXT EMBEDDING\n",
        "embeddings = Embeddings(path=TEXT_EMBEDDING_PATH, method=\"litellm\")\n",
        "embeddings.index(EMBEDDING_DATA) # create an index for the text list\n",
        "for query in (\"feel good story\", \"climate change\"): # now, let's embark on an embeddings search for each query\n",
        "    # extract the uid of the first result\n",
        "    # search result format: (uid, score)\n",
        "    uid = embeddings.search(query, 1)[0][0]\n",
        "    # print the text\n",
        "    print(\"%-20s %s\" % (query, EMBEDDING_DATA[uid]))\n",
        "```\n",
        "\n",
        "- <sup><sub>**friendly reminder**: _the library's author often [points out](https://github.com/neuml/txtai/pull/844#issuecomment-2563561232) that it's [not necessary](https://github.com/neuml/txtai/issues/843#issuecomment-2563244810) to explicitly pass the second argument `method='litellm'`. When you're learning something new, it's okay to avoid relying on shortcuts or \"magic\" until you're more comfortable. Once you understand the library better, you can start using these convenient features to your advantage. In this introduction, I'm intentionally including the second argument `method='litellm'` in the function. However, I'm choosing to leave it out in the Playground section_.</sub></sup>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1I6o5f1ue_KX"
      },
      "source": [
        "## Playground\n",
        "\n",
        "Once you've installed the necessary dependencies and configured the environment variables, you can play with and explore the code snippet. Enjoy your coding journey! 😊\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "A3NetE3Sh0e0",
        "outputId": "d9be990a-1f57-4abc-f2be-86f8e8ef7c6c"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "--------------------------------------------------\n",
            "GEMINI\n",
            "The National Mall\n",
            "..................................................\n",
            "Query                Best Match\n",
            "..................................................\n",
            "feel good story      Maine man wins $1M from $25 lottery ticket\n",
            "climate change       Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\n",
            "public health story  US tops 5 million confirmed virus cases\n",
            "war                  Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "wildlife             The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "asia                 Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "lucky                Maine man wins $1M from $25 lottery ticket\n",
            "dishonest junk       Make huge profits without work, earn up to $100,000 a day\n",
            "--------------------------------------------------\n",
            "COHERE\n",
            "As an AI chatbot, I do not have personal preferences or the ability to travel. However, I can suggest popular tourist destinations in Washington, DC, that many people enjoy visiting:\n",
            "\n",
            "1. The White House: The official residence and workplace of the President of the United States is a symbol of American democracy. Visitors can take a tour of the White House to learn about its history and see the beautiful architecture.\n",
            "\n",
            "2. National Mall: This iconic open park stretches from the United States Capitol to the Lincoln Memorial, featuring various monuments and memorials. Some notable landmarks include the Washington Monument, the Lincoln Memorial Reflecting Pool, the Vietnam Veterans Memorial, and the National World War II Memorial.\n",
            "\n",
            "3. Smithsonian Institution: The world's largest museum and research complex offers a wealth of knowledge and cultural experiences. It includes renowned museums such as the National Air and Space Museum, National Museum of Natural History, National Museum of African American History and Culture, and many more, all offering free admission.\n",
            "\n",
            "4. United States Capitol: A visit to the Capitol allows you to explore the seat of the United States Congress and learn about the legislative process. The Capitol also features impressive architecture and art, including the iconic Capitol Dome and the National Statuary Hall Collection.\n",
            "\n",
            "5. National Gallery of Art: Located on the National Mall, this renowned art museum houses a vast collection of paintings, sculptures, and other artworks from the Middle Ages to the present. It offers a chance to appreciate works by famous artists like Leonardo da Vinci, Rembrandt, and Claude Monet.\n",
            "\n",
            "These are just a few highlights, but Washington, DC, offers many more attractions, including historic sites, beautiful parks, vibrant neighborhoods, and cultural institutions, ensuring there's something for everyone to enjoy.\n",
            "..................................................\n",
            "Query                Best Match\n",
            "..................................................\n",
            "feel good story      Maine man wins $1M from $25 lottery ticket\n",
            "climate change       Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\n",
            "public health story  US tops 5 million confirmed virus cases\n",
            "war                  Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "wildlife             The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "asia                 Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "lucky                Maine man wins $1M from $25 lottery ticket\n",
            "dishonest junk       US tops 5 million confirmed virus cases\n",
            "--------------------------------------------------\n",
            "AWS_BEDROCK\n",
            "I'd recommend the Smithsonian National Air and Space Museum. It is located in Washington, DC, and is the world's largest and most visited museum complex.\n",
            "..................................................\n",
            "Query                Best Match\n",
            "..................................................\n",
            "feel good story      Maine man wins $1M from $25 lottery ticket\n",
            "climate change       Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\n",
            "public health story  US tops 5 million confirmed virus cases\n",
            "war                  Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "wildlife             The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "asia                 Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "lucky                Maine man wins $1M from $25 lottery ticket\n",
            "dishonest junk       Make huge profits without work, earn up to $100,000 a day\n",
            "--------------------------------------------------\n",
            "MISTRAL\n",
            "I would love to visit the Smithsonian National Museum of Natural History in Washington, DC. It's one of the most popular and largest museums in the world, with a vast array of exhibits showcasing natural history, including dinosaur fossils, mineral and gemstone collections, and marine life. I find the diversity and depth of knowledge presented in this museum fascinating. Additionally, it's free to the public, which makes it an accessible and enjoyable experience for everyone.\n",
            "..................................................\n",
            "Query                Best Match\n",
            "..................................................\n",
            "feel good story      The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "climate change       Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\n",
            "public health story  The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "war                  The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "wildlife             The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "asia                 Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\n",
            "lucky                Maine man wins $1M from $25 lottery ticket\n",
            "dishonest junk       Make huge profits without work, earn up to $100,000 a day\n",
            "--------------------------------------------------\n",
            "VERTEXAI\n",
            "Washington, DC, is a fascinating city with a wealth of historical and cultural attractions. If I had the opportunity to visit, I would definitely make a stop at the Smithsonian National Air and Space Museum. This museum is home to a vast collection of aircraft and spacecraft, from the Wright brothers' first plane to the Apollo 11 command module. I am particularly interested in the history of space exploration, so I would be excited to see these iconic artifacts up close.\n",
            "\n",
            "In addition to the Air and Space Museum, there are many other places in Washington, DC, that I would like to visit. The National Mall is a beautiful park that is home to many of the city's most famous monuments, including the Washington Monument, the Lincoln Memorial, and the Vietnam Veterans Memorial. I would also like to visit the White House, the Capitol Building, and the Supreme Court. These buildings are all important symbols of American democracy, and I would be honored to have the opportunity to see them in person.\n",
            "\n",
            "Washington, DC, is a city with a rich history and a bright future. I am confident that it will continue to be a major center of culture and government for many years to come.\n",
            "..................................................\n",
            "Query                Best Match\n",
            "..................................................\n",
            "feel good story      Maine man wins $1M from $25 lottery ticket\n",
            "climate change       Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\n",
            "public health story  US tops 5 million confirmed virus cases\n",
            "war                  Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "wildlife             The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "asia                 Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "lucky                Maine man wins $1M from $25 lottery ticket\n",
            "dishonest junk       Make huge profits without work, earn up to $100,000 a day\n",
            "--------------------------------------------------\n",
            "GROQ\n",
            "Washington, D.C. is a city with a plethora of amazing places to visit! But, if I had to pick just one place, I'd recommend the National Mall.\n",
            "\n",
            "The National Mall is a beautiful stretch of parkland that stretches from the Lincoln Memorial to the United States Capitol Building. It's home to some of the city's most iconic landmarks, including the Washington Monument, World War II Memorial, and Vietnam Veterans Memorial.\n",
            "\n",
            "You can take a leisurely stroll along the Mall, enjoy the scenic views of the city, and stop at one of the many museums or memorials along the way. The National Mall is also a great place to people-watch, with street performers and musicians adding to the lively atmosphere.\n",
            "\n",
            "Personally, I'd recommend starting at the Lincoln Memorial, where you can take in the stunning views of the Reflecting Pool and the Washington Monument. From there, you can walk to the World War II Memorial, where you can pay your respects to the millions of Americans who served in the war.\n",
            "\n",
            "Overall, the National Mall is a must-see destination in Washington, D.C. – and it's completely free!\n",
            "--------------------------------------------------\n",
            "OPENAI\n",
            "One iconic place to visit in Washington, DC, is the National Mall. It's home to many famous monuments and memorials, including the Lincoln Memorial, the Washington Monument, and the Vietnam Veterans Memorial. The expansive park offers a great opportunity to explore the history and culture of the nation, and it's a beautiful spot for walking, picnicking, and enjoying the iconic views.\n",
            "..................................................\n",
            "Query                Best Match\n",
            "..................................................\n",
            "feel good story      Maine man wins $1M from $25 lottery ticket\n",
            "climate change       Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\n",
            "public health story  US tops 5 million confirmed virus cases\n",
            "war                  Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "wildlife             The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "asia                 Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "lucky                Maine man wins $1M from $25 lottery ticket\n",
            "dishonest junk       Make huge profits without work, earn up to $100,000 a day\n",
            "--------------------------------------------------\n",
            "CLAUDE\n",
            "One great place to visit in Washington, DC is the Smithsonian National Air and Space Museum. It's located on the National Mall and features fascinating exhibits about aviation and space exploration, including famous aircraft and spacecraft like the Wright Brothers' plane and the Apollo 11 command module. The museum is free to enter and is very popular with visitors of all ages.\n",
            "--------------------------------------------------\n",
            "VOYAGE\n",
            "..................................................\n",
            "Query                Best Match\n",
            "..................................................\n",
            "feel good story      Maine man wins $1M from $25 lottery ticket\n",
            "climate change       Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\n",
            "public health story  Maine man wins $1M from $25 lottery ticket\n",
            "war                  The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "wildlife             The National Park Service warns against sacrificing slower friends in a bear attack\n",
            "asia                 Beijing mobilises invasion craft along coast as Taiwan tensions escalate\n",
            "lucky                Maine man wins $1M from $25 lottery ticket\n",
            "dishonest junk       Make huge profits without work, earn up to $100,000 a day\n",
            "--------------------------------------------------\n"
          ]
        }
      ],
      "source": [
        "# A text prompt to run through the LLM pipeline\n",
        "# https://neuml.github.io/txtai/pipeline/text/llm/\n",
        "LLM_PROMPT_INPUT = \"Where is one place you'd go in Washington, DC?\"\n",
        "\n",
        "# The embeddings dataset is versatile! It plays with lists, datasets, or even generators.\n",
        "# https://neuml.github.io/txtai/embeddings/\n",
        "EMBEDDING_DATA = [\n",
        "  \"US tops 5 million confirmed virus cases\",\n",
        "  \"Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\",\n",
        "  \"Beijing mobilises invasion craft along coast as Taiwan tensions escalate\",\n",
        "  \"The National Park Service warns against sacrificing slower friends in a bear attack\",\n",
        "  \"Maine man wins $1M from $25 lottery ticket\",\n",
        "  \"Make huge profits without work, earn up to $100,000 a day\"\n",
        "]\n",
        "\n",
        "# https://neuml.github.io/txtai/pipeline/text/llm/\n",
        "def runllm(prompt=\"\", path=None):\n",
        "  if path:\n",
        "    # A quick note: you can skip specifying the `method` argument.\n",
        "    # There's an autodetection logic designed to recognize it as a `litellm` model.\n",
        "    # \n",
        "    # llm = LLM(LLM_MODEL_NAME, method=\"litellm\")\n",
        "    # \n",
        "    llm = LLM(path)\n",
        "\n",
        "    # OR: print(llm(LLM_PROMPT_INPUT, defaultrole=\"user\"))\n",
        "    print(llm([{\"role\": \"user\", \"content\": prompt}]))\n",
        "\n",
        "# https://neuml.github.io/txtai/embeddings/\n",
        "def runembeddings(data=None, path=None):\n",
        "  if path:\n",
        "    embeddings = Embeddings(\n",
        "      path=path,\n",
        "      # a quick note: you can skip specifying the `method` argument - there is autodetection logic\n",
        "      # method=\"litellm\"\n",
        "    )\n",
        "    # create an index for the list of text\n",
        "    embeddings.index(data)\n",
        "\n",
        "    print(\".\" * 50)\n",
        "    print(\"%-20s %s\" % (\"Query\", \"Best Match\"))\n",
        "    print(\".\" * 50)\n",
        " \n",
        "    # run an embeddings search for each query\n",
        "    for query in (\"feel good story\", \"climate change\",\n",
        "        \"public health story\", \"war\", \"wildlife\", \"asia\",\n",
        "        \"lucky\", \"dishonest junk\"):\n",
        "      # extract uid of first result\n",
        "      # search result format: (uid, score)\n",
        "      uid = embeddings.search(query, 1)[0][0]\n",
        "      # print text\n",
        "      print(\"%-20s %s\" % (query, data[uid]))\n",
        "\n",
        "# Let's LOOP THROUGH each enabled LLM and embedding model.\n",
        "for LLM_MODEL, LLM_CONFIG in LLM_MODELS.items():\n",
        "  LLM_MODEL_NAME = LLM_CONFIG['LLM_MODEL_NAME'] if 'LLM_MODEL_NAME' in LLM_CONFIG else None\n",
        "  TEXT_EMBEDDING_PATH = LLM_CONFIG['TEXT_EMBEDDING_PATH'] if 'TEXT_EMBEDDING_PATH' in LLM_CONFIG else None\n",
        "  print(\"-\" * 50)\n",
        "  print(LLM_MODEL)\n",
        "\n",
        "  # https://neuml.github.io/txtai/pipeline/text/llm/\n",
        "  runllm(prompt=LLM_PROMPT_INPUT, path=LLM_MODEL_NAME)\n",
        "\n",
        "  # https://neuml.github.io/txtai/embeddings/\n",
        "  runembeddings(data=EMBEDDING_DATA, path=TEXT_EMBEDDING_PATH)\n",
        "\n",
        "print(\"-\" * 50)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [
        "lhrk9y766Lw1",
        "wuCvBPCL-VQG",
        "pgQuDv4fZgHe",
        "0KsLGZFtrZVT",
        "I7a3d8EsrdBc",
        "xZg16h4KHpE5",
        "Cqh5j0a4JcHg",
        "oPq-sHZifHFZ"
      ],
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
